26 research outputs found
Multimodal Computational Attention for Scene Understanding
Robotic systems have limited computational capacities. Hence, computational attention models are important to focus on specific stimuli and allow for complex cognitive processing. For this purpose, we developed auditory and visual attention models that enable robotic platforms to efficiently explore and analyze natural scenes. To allow for attention guidance in human-robot interaction, we use machine learning to integrate the influence of verbal and non-verbal social signals into our models
Focusing computational visual attention in multi-modal human-robot interaction
Identifying verbally and non-verbally referred-to objects is an im-portant aspect of human-robot interaction. Most importantly, it is essential to achieve a joint focus of attention and, thus, a natural interaction behavior. In this contribution, we introduce a saliency-based model that reflects how multi-modal referring acts influence the visual search, i.e. the task to find a specific object in a scene. Therefore, we combine positional information obtained from point-ing gestures with contextual knowledge about the visual appear-ance of the referred-to object obtained from language. The avail-able information is then integrated into a biologically-motivated saliency model that forms the basis for visual search. We prove the feasibility of the proposed approach by presenting the results of an experimental evaluation
Saliency-based identification and recognition of pointed-at objects
Abstract — When persons interact, non-verbal cues are used to direct the attention of persons towards objects of interest. Achieving joint attention this way is an important aspect of natural communication. Most importantly, it allows to couple verbal descriptions with the visual appearance of objects, if the referred-to object is non-verbally indicated. In this contri-bution, we present a system that utilizes bottom-up saliency and pointing gestures to efficiently identify pointed-at objects. Furthermore, the system focuses the visual attention by steering a pan-tilt-zoom camera towards the object of interest and thus provides a suitable model-view for SIFT-based recognition and learning. We demonstrate the practical applicability of the proposed system through experimental evaluation in different environments with multiple pointers and objects
On the Distribution of Salient Objects in Web Images and its Influence on Salient Object Detection
It has become apparent that a Gaussian center bias can serve as an important
prior for visual saliency detection, which has been demonstrated for predicting
human eye fixations and salient object detection. Tseng et al. have shown that
the photographer's tendency to place interesting objects in the center is a
likely cause for the center bias of eye fixations. We investigate the influence
of the photographer's center bias on salient object detection, extending our
previous work. We show that the centroid locations of salient objects in
photographs of Achanta and Liu's data set in fact correlate strongly with a
Gaussian model. This is an important insight, because it provides an empirical
motivation and justification for the integration of such a center bias in
salient object detection algorithms and helps to understand why Gaussian models
are so effective. To assess the influence of the center bias on salient object
detection, we integrate an explicit Gaussian center bias model into two
state-of-the-art salient object detection algorithms. This way, first, we
quantify the influence of the Gaussian center bias on pixel- and segment-based
salient object detection. Second, we improve the performance in terms of F1
score, Fb score, area under the recall-precision curve, area under the receiver
operating characteristic curve, and hit-rate on the well-known data set by
Achanta and Liu. Third, by debiasing Cheng et al.'s region contrast model, we
exemplarily demonstrate that implicit center biases are partially responsible
for the outstanding performance of state-of-the-art algorithms. Last but not
least, as a result of debiasing Cheng et al.'s algorithm, we introduce a
non-biased salient object detection method, which is of interest for
applications in which the image data is not likely to have a photographer's
center bias (e.g., image data of surveillance cameras or autonomous robots)
Learning Colour Representations of Search Queries
Image search engines rely on appropriately designed ranking features that
capture various aspects of the content semantics as well as the historic
popularity. In this work, we consider the role of colour in this relevance
matching process. Our work is motivated by the observation that a significant
fraction of user queries have an inherent colour associated with them. While
some queries contain explicit colour mentions (such as 'black car' and 'yellow
daisies'), other queries have implicit notions of colour (such as 'sky' and
'grass'). Furthermore, grounding queries in colour is not a mapping to a single
colour, but a distribution in colour space. For instance, a search for 'trees'
tends to have a bimodal distribution around the colours green and brown. We
leverage historical clickthrough data to produce a colour representation for
search queries and propose a recurrent neural network architecture to encode
unseen queries into colour space. We also show how this embedding can be learnt
alongside a cross-modal relevance ranker from impression logs where a subset of
the result images were clicked. We demonstrate that the use of a query-image
colour distance feature leads to an improvement in the ranker performance as
measured by users' preferences of clicked versus skipped images.Comment: Accepted as a full paper at SIGIR 202
Multimodal computational attention for scene understanding and robotics
This book presents state-of-the-art computational attention models that have been successfully tested in diverse application areas and can build the foundation for artificial systems to efficiently explore, analyze, and understand natural scenes. It gives a comprehensive overview of the most recent computational attention models for processing visual and acoustic input. It covers the biological background of visual and auditory attention, as well as bottom-up and top-down attentional mechanisms and discusses various applications. In the first part new approaches for bottom-up visual and acoustic saliency models are presented and applied to the task of audio-visual scene exploration of a robot. In the second part the influence of top-down cues for attention modeling is investigated.